Geostatistical Modelling using R-INLA

Author

Afiq Amsyar

R-INLA

INLA is implemented as an R package called INLA, although this package is also called R-INLA. The package is not available from the main R repository CRAN, but from a specific repository at http://www.r-inla.org. The INLA package is available for Windows, Mac OS X and Linux, and there are a stable and a testing version.

A simple way to install the stable version is shown below. For the testing version, simply replace stable by testing when setting the repository.

options(timeout = 600)  # Increase timeout to 600 seconds
install.packages("INLA", repos=c(getOption("repos"), INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)

The main function in the package is inla(), which is the one used to fit the Bayesian models using the INLA methodology. This function works in a similar way as function glm() (for generalized linear models) or gam() (for generalized additive models). A formula is used to specify the model to be fitted and it can include a mix of fixed and other effects conveniently specified.

Specific (random) effects are specified using the f() function. This includes an index to map the effect to the observations, the type of effect, additional parameters and the priors on the hyperparameters of the effect. When including a random effect in the model, not all of these options need to be specified.

1 Load Packages

library(pacman) 
Warning: package 'pacman' was built under R version 4.4.1
p_load(sf, rgeos, rgdal, dplyr, tmap, leaflet, tidyverse, raster,lwgeom, corrplot,viridis,mice)
Installing package into 'C:/Users/ACER/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
Warning: package 'rgeos' is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4:
  cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4/PACKAGES'
Warning in p_install(package, character.only = TRUE, ...):
Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
logical.return = TRUE, : there is no package called 'rgeos'
Installing package into 'C:/Users/ACER/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
Warning: package 'rgdal' is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
Warning: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4:
  cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/4.4/PACKAGES'
Warning in p_install(package, character.only = TRUE, ...):
Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
logical.return = TRUE, : there is no package called 'rgdal'
Warning in p_load(sf, rgeos, rgdal, dplyr, tmap, leaflet, tidyverse, raster, : Failed to install/load:
rgeos, rgdal
library(INLA)
Warning: package 'INLA' was built under R version 4.4.1
Loading required package: Matrix

Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':

    expand, pack, unpack
This is INLA_24.06.27 built 2024-06-27 02:36:04 UTC.
 - See www.r-inla.org/contact-us for how to get help.
 - List available models/likelihoods/etc with inla.list.models()
 - Use inla.doc(<NAME>) to access documentation
library(readxl)

2 Import Polygon

# with mukim 
Kel <- st_read("kelantan.shp")
Reading layer `kelantan' from data source 
  `C:\Users\ACER\Desktop\inla\kelantan.shp' using driver `ESRI Shapefile'
Simple feature collection with 66 features and 6 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 371629.6 ymin: 503028.2 xmax: 519479.6 ymax: 690232.8
Projected CRS: Kertau (RSO) / RSO Malaya (m)
plot(Kel)

2.1 Insert projected coordinates reference system

st_crs(Kel) <- 3168 
head(Kel)
Simple feature collection with 6 features and 6 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 476592.2 ymin: 654046.7 xmax: 493907.6 ymax: 682355.5
Projected CRS: Kertau (RSO) / RSO Malaya (m)
    NEGERI DAERAH                 MUKIM LELAKI PEREMPUAN JUM_JANTIN
1 KELANTAN BACHOK                BEKLAM   4859      4813       9672
2 KELANTAN BACHOK GUNONG (GUNONG TIMOR)  11100     10884      21984
3 KELANTAN BACHOK              MAHLIGAI   4564      4600       9164
4 KELANTAN BACHOK               PERUPOK   8777      8614      17391
5 KELANTAN BACHOK        MELAWI (REPEK)   9227      8672      17899
6 KELANTAN BACHOK      TAWANG (MENTUAN)  14140     13632      27772
                        geometry
1 POLYGON ((485501.8 669698.8...
2 POLYGON ((487716.5 665649.5...
3 POLYGON ((482744.8 660223.4...
4 POLYGON ((486936.9 677358.5...
5 POLYGON ((490841.7 668783.4...
6 POLYGON ((486910.8 677819.3...

2.2 Obtain centroid of polygon

# obtain centroid of polygon 
centroid <- Kel %>% mutate(lon = map_dbl(geometry, ~st_centroid(.x)[[1]]), 
                           lat = map_dbl(geometry, ~st_centroid(.x)[[2]]))
# Select only woking variable 
centroid <- centroid[,c(3,6,8,9)]

3 Leptospirosis Data

3.1 Import Case Data

dat <- read_excel ("lepto.xlsx")
names(dat)
 [1] "Diagnosis"              "Tahun Daftar"           "Epid Daftar"           
 [4] "Age"                    "Alamat semasa/kejadian" "Poskod"                
 [7] "lat"                    "long"                   "Notifikasi Status"     
[10] "Race"                   "Kewarganegaraan"        "Gender"                
[13] "Nationality"            "Klasifikasi Kes"        "Cara Pengesanan Kes"   
[16] "Jenis Rawatan"          "Daerah"                 "MUKIM"                 
[19] "Negeri"                
# convert raw data into aggregated data by sub-district 
aggregate1 <- group_by(dat, MUKIM) %>% 
  summarize( cases = n() ) 
head(aggregate1)
# A tibble: 6 × 2
  MUKIM                      cases
  <chr>                      <int>
1 ALOR PASIR                    11
2 BADANG                         5
3 BANGGU                         3
4 BATU MELINTANG (BELIMBING)    12
5 BATU MENGKEBANG               89
6 BEKLAM                         1

3.2 Merging data between main data and centroid

# merge the data 
main.dat <- merge(aggregate1, centroid, by.x = "MUKIM",by.y = "MUKIM") 
names(main.dat)
[1] "MUKIM"      "cases"      "JUM_JANTIN" "lon"        "lat"       
[6] "geometry"  
# remove geometry 
main.dat <- main.dat[,-c(6)]

# create spatial point using kertau 
dat.kertau <- SpatialPoints(main.dat[, c("lon", "lat")], proj4string = CRS("+proj=omerc +no_uoff +lat_0=4 +lonc=102.25 +alpha=323.0257905 +gamma=323.130102361111 +k=0.99984 +x_0=804670.24 +y_0=0 +ellps=evrst69 +units=m +no_defs"))

# convert CRS to WGS84 
dat.WGS84 <- spTransform(dat.kertau, CRS("+proj=longlat +datum=WGS84")) 

# add longlat variable into main data frame 
main.dat[, c("lat", "lon")] <- coordinates(dat.WGS84) 
head(main.dat)
                       MUKIM cases JUM_JANTIN      lon      lat
1                 ALOR PASIR    11      10112 6.057231 102.0637
2                     BADANG     5      35957 6.184074 102.2579
3                     BANGGU     3      23049 6.049841 102.3182
4 BATU MELINTANG (BELIMBING)    12       8456 5.674871 101.7198
5            BATU MENGKEBANG    89      63575 5.551418 102.2425
6                     BEKLAM     1       9672 6.035219 102.3482

3.3 Calculate of prevalence by sub-district

# calculate incidence of cases per 100000 population 
main.dat <- main.dat %>% mutate(incidence = (cases / JUM_JANTIN) * 100000)

4 Describing spatial data

4.1 Map the incidence

main.data.sf.merge <- merge(Kel, main.dat, by.x="MUKIM", by.y="MUKIM") 
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(main.data.sf.merge) + tm_borders("grey25", alpha=.2) +
  tm_shape(main.data.sf.merge) + 
  tm_borders("grey25", alpha=.5)+ 
  tm_fill("incidence", palette = "YlOrRd", style="cont", n=7,alpha=.6, fill.title = "", title = "Incidence of Leptospirosis Cases")

4.2 Map the incidence by Centroid of Sub-districts

# create 4 color representing prevalence of cases
pal <- colorBin("viridis", bins = c(0, 100, 200, 300, 400)) 

#map the coordinate 
leaflet(main.dat) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>% 
  addCircles(lng = ~lat, lat = ~lon, color = ~pal(incidence)) %>% 
  addLegend("bottomright", pal = pal, values = ~ incidence, title = "Incidence", labFormat = labelFormat(digits = 5)) %>% 
  addScaleBar(position = c("bottomleft"))

5 Import environmental covariate

5.1 Get tPolygon Map to crop the

Malaysia <- st_read("mys_admbnda_adm1_unhcr_20210211.shp")
Reading layer `mys_admbnda_adm1_unhcr_20210211' from data source 
  `C:\Users\ACER\Desktop\inla\mys_admbnda_adm1_unhcr_20210211.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 16 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
Geodetic CRS:  WGS 84
Kel.nomukim <- subset(Malaysia, ADM1_EN=='Kelantan')
Kel.WGS84 <- st_transform(Kel.nomukim, crs = 4326) # convert CRS from Kertau to WGS84 
Kel.WGS84
Simple feature collection with 1 feature and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 101.3369 ymin: 4.549111 xmax: 102.6678 ymax: 6.245808
Geodetic CRS:  WGS 84
   ADM1_EN ADM1_PCODE  ADM0_EN ADM0_PCODE       date    validOn    validTo
3 Kelantan       MY03 Malaysia         MY 2020-12-02 2021-02-11 -001-11-30
  Shape_Leng Shape_Area                       geometry
3   7.185704   1.234524 MULTIPOLYGON (((102.174 6.2...

5.2 Altitude

library(geodata)
Warning: package 'geodata' was built under R version 4.4.1
Loading required package: terra
Warning: package 'terra' was built under R version 4.4.1
terra 1.7.78

Attaching package: 'terra'
The following object is masked from 'package:tidyr':

    extract
r <- elevation_30s(country = "MYS", path = tempdir())

5.3 Crop Altitude for Kelantan

altitude.r.crop <- crop(r, Kel.WGS84)
# remove raster value outside polygon 
altitude.r.crop <- mask(r, Kel.WGS84)

5.4 Plot Altitude for Kelantan

#plot the raster 
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(altitude.r.crop) +
  tm_raster(style= "quantile", n = 7 , palette = viridis(7, direction = 1) ,alpha = .5,title = "Altitude (in meter)")
stars object downsampled to 1710 by 584 cells. See tm_shape manual (argument raster.downsample)

5.5 Add covariate (elevation) into main data sets

main.dat$alt <- raster::extract(r, main.dat[, c("lat", "lon")])$MYS_elv_msk 
head(main.dat)
                       MUKIM cases JUM_JANTIN      lon      lat incidence alt
1                 ALOR PASIR    11      10112 6.057231 102.0637 108.78165  10
2                     BADANG     5      35957 6.184074 102.2579  13.90550  11
3                     BANGGU     3      23049 6.049841 102.3182  13.01575  12
4 BATU MELINTANG (BELIMBING)    12       8456 5.674871 101.7198 141.91107 293
5            BATU MENGKEBANG    89      63575 5.551418 102.2425 139.99214  45
6                     BEKLAM     1       9672 6.035219 102.3482  10.33912   6

6 Modeling

Here we specify the model to predict Leptospirosis in Kelantan and detail the steps to fit the model using the SPDE approach and the R-INLA package.

6.1 Mesh construction

# extract the boundary of Kelantan from sf-class object to sp-class 
kel.bdry <- as(Kel.WGS84, "Spatial") %>% INLA::inla.sp2segment()
# create the mesh and supply the boundary argument 
coo <- cbind(main.dat$lat,main.dat$lon) 
mesh <- INLA::inla.mesh.2d(boundary = kel.bdry,loc = coo, max.edge = c(0.1, 5), cutoff = 0.01)
# No. of mesh created 
mesh$n
[1] 1576
plot(mesh) 
points(coo, col = "red")

6.2 Build SPDE model on the mesh

spde <- INLA::inla.spde2.matern(mesh = mesh, alpha = 2)

6.3 Index set

Now we generate the index set for the SPDE model. We do this with the function inla.spde.make.index() where we specify the name of the effect (s) and the number of vertices in the SPDE model (spde$n.spde). This creates a list with vector s equal to 1:spde$n.spde, and vectors s.group and s.repl that have all elements equal to 1s and size given by the number of mesh vertices.

indexs <- INLA::inla.spde.make.index("s", spde$n.spde) 
lengths(indexs)
      s s.group  s.repl 
   1576    1576    1576 

6.4 Projection matrix

We need to build a projector matrix A that projects the spatially continuous Gaussian random field at the mesh nodes. The projector matrix A can be built with the inla.spde.make.A() function passing the mesh and the coordinates.

A <- INLA::inla.spde.make.A(mesh = mesh, loc = coo)

6.5 Prediction data

Here we specify the locations where we wish to predict the prevalence. We set the prediction locations to the locations of the raster of the covariate elevation. We can get the coordinates of the raster r with the function crds() of the terra package.

library(terra)
dp <- terra::crds(altitude.r.crop)
dim(dp)
[1] 18150     2

In this example, we use fewer prediction points so the computation is faster. We can lower the resolution of the raster by using the aggregate() function of terra. The arguments of the function are

  • x: raster object,

  • fact: aggregation factor expressed as number of cells in each direction (horizontally and vertically),

  • fun: function used to aggregate values.

We specify fact = 5 to aggregate 5 cells in each direction, and fun = mean to compute the mean of the cell values.

ra <- aggregate(altitude.r.crop, fact = 5, fun = mean)
dp <- terra::crds(ra)
dp <- as.data.frame(dp)
dim(dp)
[1] 652   2

We call coop to the matrix of coordinates with the prediction locations. We add to the prediction data dp a column cov with the elevation values for the prediction coordinates.

coop <- terra::crds(ra)
dp$cov <- extract(ra, coop)[, 1]

6.6 Projector matrix

We also construct the matrix that projects the spatially continuous Gaussian random field from the prediction locations to the mesh nodes.

Ap <- inla.spde.make.A(mesh = mesh, loc = coop)

6.7 Stack data for the estimation and prediction

# stack for estimation stk.e
stk.e <- inla.stack(
  tag = "est",
  data = list(y = main.dat$cases, numtrials = main.dat$JUM_JANTIN),
  A = list(1, A),
  effects = list(data.frame(b0 = 1, elevation = main.dat$alt), s = indexs)
)

# stack for prediction stk.p
stk.p <- inla.stack(
  tag = "pred",
  data = list(y = NA, numtrials = NA),
  A = list(1, Ap),
  effects = list(data.frame(b0 = 1, elevation = dp$cov),
    s = indexs
  )
)

# stk.full has stk.e and stk.p
stk.full <- inla.stack(stk.e, stk.p)

6.8 Model formula

We specify the model formula by including the response in the left-hand side, and the fixed and random effects in the right-hand side. In the formula, we remove the intercept (adding 0) and add it as a covariate term (adding b0), so all the covariate terms can be captured in the projection matrix.

formula <- y ~ 0 + b0 + elevation + f(s, model = spde)

6.9 inla() call

We fit the model by calling inla() and using the default priors in R-INLA. We specify the formula, family, data, and options. In control.predictor we set compute = TRUE to compute the posteriors of the predictions. We set link=1 to compute the fitted values (res$summary.fitted.values and res$marginals.fitted.values) with the same link function as the family specified in the model. We also add control.compute = list(return.marginals.predictor = TRUE) to obtain the marginals.

res <- inla(formula,
  family = "binomial", Ntrials = numtrials,
  control.family = list(link = "logit"),
  data = inla.stack.data(stk.full),
  control.predictor = list(compute = TRUE, link = 1,
                           A = inla.stack.A(stk.full)),
  control.compute = list(return.marginals.predictor = TRUE)
)
summary(res)
Time used:
    Pre = 0.473, Running = 2.09, Post = 0.126, Total = 2.68 
Fixed effects:
            mean    sd 0.025quant 0.5quant 0.975quant   mode   kld
b0        -6.646 1.054     -8.680   -6.697     -4.262 -6.677 0.057
elevation  0.000 0.001     -0.002    0.000      0.002  0.000 0.000

Random effects:
  Name    Model
    s SPDE2 model

Model hyperparameters:
              mean    sd 0.025quant 0.5quant 0.975quant  mode
Theta1 for s -2.45 0.223     -2.887    -2.46      -2.01 -2.46
Theta2 for s  1.34 0.380      0.568     1.35       2.06  1.38

Marginal log-Likelihood:  -222.10 
 is computed 
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

7 Mapping the ouput

Now we map the Leptospirosis prevalence predictions using leaflet. The mean prevalence and lower and upper limits of 95% credible intervals are in the data frame res$summary.fitted.values. The rows of res$summary.fitted.values that correspond to the prediction locations can be obtained by selecting the indices of the stack stk.full that are tagged with tag = "pred". We can obtain these indices by using inla.stack.index() passing stk.full and tag = "pred".

index <- inla.stack.index(stack = stk.full, tag = "pred")$data

We create vectors with the mean prevalence and lower and upper limits of 95% credible intervals with the values of the columns "mean""0.025quant" and "0.975quant" and the rows given by index.

prev_mean <- res$summary.fitted.values[index, "mean"]
prev_ll <- res$summary.fitted.values[index, "0.025quant"]
prev_ul <- res$summary.fitted.values[index, "0.975quant"]

7.1 Mapping predicted mean incidence using raster

# Create SpatVector object
sv <- terra::vect(coop, atts = data.frame(prev_mean = prev_mean,
                               prev_ll = prev_ll, prev_ul = prev_ul),
                  crs = "+proj=longlat +datum=WGS84")

# rasterize
r_prev_mean <- terra::rasterize(
  x = sv, y = ra, field = "prev_mean",
  fun = mean
)
pal <- colorNumeric("viridis", c(0, 1), na.color = "transparent")

leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addRasterImage(r_prev_mean, colors = pal, opacity = 0.5) %>%
  addLegend("bottomright",
    pal = pal,
    values = values(r_prev_mean), title = "Incidence"
  ) %>%
  addScaleBar(position = c("bottomleft"))

We can follow the same approach to create maps with the lower and upper limits of the prevalece estimates. First we create rasters with the lower and upper limits of the prevalences. Then we make the map with the same palette function we used to plot the mean prevalence.